{
  "cells": [
    {
      "cell_type": "markdown",
      "id": "b6157254",
      "metadata": {
        "id": "b6157254"
      },
      "source": [
        "Copyright (C) 2021-2022, Intel Corporation\n",
        "\n",
        "SPDX-License-Identifier: Apache-2.0\n",
        "\n",
        "Major Portions of this code are copyright of their respective authors and released under the Apache License Version 2.0:\n",
        "- onnx, Copyright 2021-2022. For licensing see https://github.com/onnx/models/blob/master/LICENSE"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "dfed77f6",
      "metadata": {
        "id": "dfed77f6"
      },
      "source": [
        "# Object detection with YOLOv4 in Python using OpenVINO™ Execution Provider:\n",
        "\n",
        "1. The Object detection sample uses a YOLOv4 Deep Learning ONNX Model from the ONNX Model Zoo.\n",
        "\n",
        "\n",
        "2. The sample involves presenting an image to ONNX Runtime (RT), which uses the OpenVINO™ Execution Provider to run inference on various Intel hardware devices as mentioned before and perform object detection to detect up to 80 different objects like person, bicycle, car, motorbike and much more from the coco dataset.\n",
        "\n",
        "\n",
        "4. Once the inferencing is done on the sample, the recording of the same also gets downloaded on the disk.\n",
        "\n",
        "The source code for this sample is available [here](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/python/OpenVINO_EP/yolov4_object_detection)."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "08dd644b",
      "metadata": {
        "id": "08dd644b"
      },
      "source": [
        "## Install Requirements"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "d18b13f8",
      "metadata": {
        "id": "d18b13f8"
      },
      "outputs": [],
      "source": [
        "!pip -q install --upgrade pip\n",
        "!pip -q install folium==0.2.1\n",
        "!pip -q install imgaug==0.2.6\n",
        "!pip -q install certifi==2022.5.18.1\n",
        "!pip -q install flatbuffers==2.0\n",
        "!pip -q install numpy==1.21.6\n",
        "!pip -q install onnx==1.11.0\n",
        "!pip -q install opencv-python==4.5.5.64\n",
        "!pip -q install scipy==1.7.3\n",
        "!pip -q install typing-extensions==4.1.1\n",
        "!pip -q install onnxruntime-openvino"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "97b43517",
      "metadata": {
        "id": "97b43517"
      },
      "source": [
        "## Import Necessary Resources"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "06698673",
      "metadata": {
        "id": "06698673"
      },
      "outputs": [],
      "source": [
        "import cv2\n",
        "import numpy as np\n",
        "from onnx import numpy_helper\n",
        "import onnx\n",
        "import onnxruntime as rt\n",
        "import os\n",
        "from PIL import Image\n",
        "from scipy import special\n",
        "import colorsys\n",
        "import random\n",
        "import argparse\n",
        "import sys\n",
        "import time\n",
        "from google.colab.patches import cv2_imshow"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "3c7fe112",
      "metadata": {
        "id": "3c7fe112"
      },
      "source": [
        "## Get the model and input"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "MtdBsnOeJJkk",
      "metadata": {
        "id": "MtdBsnOeJJkk"
      },
      "outputs": [],
      "source": [
        "files = os.listdir('.')\n",
        "if ('yolov4_anchors.txt' not in files):\n",
        "  !wget https://raw.githubusercontent.com/microsoft/onnxruntime-inference-examples/main/python/OpenVINO_EP/yolov4_object_detection/yolov4_anchors.txt\n",
        "if ('coco.names' not in files):\n",
        "  !wget https://raw.githubusercontent.com/microsoft/onnxruntime-inference-examples/main/python/OpenVINO_EP/yolov4_object_detection/coco.names\n",
        "if ('yolov4.onnx' not in files):\n",
        "  !wget https://github.com/onnx/models/blob/main/vision/object_detection_segmentation/yolov4/model/yolov4.onnx?raw=true -O yolov4.onnx\n",
        "if ('cat.jpg' not in files):\n",
        "  !wget https://storage.openvinotoolkit.org/data/test_data/images/cat.jpg"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "71fec400",
      "metadata": {
        "id": "71fec400"
      },
      "source": [
        "## Preprocess"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "23e66f38",
      "metadata": {
        "id": "23e66f38"
      },
      "source": [
        "### Reshape the input to align with the model"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "a4ba63cf",
      "metadata": {
        "id": "a4ba63cf"
      },
      "source": [
        "\n",
        "When we are using a pre-trained model, which is trained & fine-tuned using a fixed image size as input, we should resize our image to a shape which is expected by the model. The image reshaped using a scaling factor which is a ratio between the desired height/width and the actual image height/width.  \n",
        "$$scale = min \\biggl( \\frac{\\text{target height}}{\\text{input image height}}, \\frac{\\text{target width}}{\\text{input image width}} \\biggl)$$  \n",
        "Using the $scale$-factor, image height & width are calculated which is then re-shaped to the desired image size using the `opencv` package. Here this is acheived by the `image_preprocess` helper function.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "1bab58d5",
      "metadata": {
        "id": "1bab58d5"
      },
      "outputs": [],
      "source": [
        "def image_preprocess(image, target_size, gt_boxes=None):\n",
        "    \n",
        "    ih, iw = target_size\n",
        "    h, w, _ = image.shape\n",
        "\n",
        "    scale = min(iw/w, ih/h)\n",
        "\n",
        "    nw, nh = int(scale * w), int(scale * h)\n",
        "    image_resized = cv2.resize(image, (nw, nh))\n",
        "\n",
        "    image_padded = np.full(shape=[ih, iw, 3], fill_value=128.0)\n",
        "    dw, dh = (iw - nw) // 2, (ih-nh) // 2\n",
        "    image_padded[dh:nh+dh, dw:nw+dw, :] = image_resized\n",
        "    image_padded = image_padded / 255.\n",
        "\n",
        "    if gt_boxes is None:\n",
        "        return image_padded\n",
        "\n",
        "    else:\n",
        "        gt_boxes[:, [0, 2]] = gt_boxes[:, [0, 2]] * scale + dw\n",
        "        gt_boxes[:, [1, 3]] = gt_boxes[:, [1, 3]] * scale + dh\n",
        "        return image_padded, gt_boxes"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "2d86e239",
      "metadata": {
        "id": "2d86e239"
      },
      "source": [
        "### Check file paths\n",
        "\n",
        "`check_model_extention` is a helper function which checks if the model is present in the location specified.  \n",
        "It also validates the model by checking the model file extension. The expected model file should be of `<model_name>.onnx` format."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "2ad33e3d",
      "metadata": {
        "id": "2ad33e3d"
      },
      "outputs": [],
      "source": [
        "def check_model_extension(fp):\n",
        "  # Split the extension from the path and normalise it to lowercase.\n",
        "  ext = os.path.splitext(fp)[-1].lower()\n",
        "\n",
        "  # Now we can simply use != to check for inequality, no need for wildcards.\n",
        "  if(ext != \".onnx\"):\n",
        "    raise Exception(fp, \"is an unknown file format. Use the model ending with .onnx format\")\n",
        "  \n",
        "  if not os.path.exists(fp):\n",
        "    raise Exception(\"[ ERROR ] Path of the onnx model file is Invalid\")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "38c0ebb9",
      "metadata": {
        "id": "38c0ebb9"
      },
      "source": [
        "## Postprocess"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "9f7fe3af",
      "metadata": {
        "id": "9f7fe3af"
      },
      "source": [
        "### Defines anchor boxes"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "53492b22",
      "metadata": {
        "id": "53492b22"
      },
      "source": [
        "\n",
        "Anchor boxes are a set of predefined bounding boxes of a certain height and width. These boxes are defined to capture the scale and aspect ratio of specific object classes we want to detect and are typically chosen based on object sizes in the training datasets. The use of anchor boxes improves the speed and efficiency for the detection portion of a deep learning neural network framework. Anchor boxes, facilitates the evaluation of object predictions at once, making real-time object detection systems possible.\n",
        "\n",
        "The following function takes the anchor box prediction probabilities and refines it corresponding to the tiled anchor boxes of the input image."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "f3478f30",
      "metadata": {
        "id": "f3478f30"
      },
      "outputs": [],
      "source": [
        "def postprocess_bbbox(pred_bbox):\n",
        "    for i, pred in enumerate(pred_bbox):\n",
        "        conv_shape = pred.shape\n",
        "        output_size = conv_shape[1]\n",
        "        conv_raw_dxdy = pred[:, :, :, :, 0:2]\n",
        "        conv_raw_dwdh = pred[:, :, :, :, 2:4]\n",
        "        xy_grid = np.meshgrid(np.arange(output_size), np.arange(output_size))\n",
        "        xy_grid = np.expand_dims(np.stack(xy_grid, axis=-1), axis=2)\n",
        "\n",
        "        xy_grid = np.tile(np.expand_dims(xy_grid, axis=0), [1, 1, 1, 3, 1])\n",
        "        xy_grid = xy_grid.astype(float)\n",
        "\n",
        "        pred_xy = ((special.expit(conv_raw_dxdy) * XYSCALE[i]) - 0.5 * (XYSCALE[i] - 1) + xy_grid) * STRIDES[i]\n",
        "        pred_wh = (np.exp(conv_raw_dwdh) * ANCHORS[i])\n",
        "        pred[:, :, :, :, 0:4] = np.concatenate([pred_xy, pred_wh], axis=-1)\n",
        "\n",
        "    pred_bbox = [np.reshape(x, (-1, np.shape(x)[-1])) for x in pred_bbox]\n",
        "    pred_bbox = np.concatenate(pred_bbox, axis=0)\n",
        "    return pred_bbox"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "176a4ec6",
      "metadata": {
        "id": "176a4ec6"
      },
      "source": [
        "### Removes boundary boxs with a low detection probability"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "6dca198c",
      "metadata": {
        "id": "6dca198c"
      },
      "source": [
        "The following function takes input as the prediction boxes (obtained by the previous function) and processes them to create bounding boxes. It also gets rid of certain prediction boxes based on a score threshold value."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "3c82a10a",
      "metadata": {
        "id": "3c82a10a"
      },
      "outputs": [],
      "source": [
        "def postprocess_boxes(pred_bbox, org_img_shape, input_size, score_threshold):\n",
        "    valid_scale=[0, np.inf]\n",
        "    pred_bbox = np.array(pred_bbox)\n",
        "\n",
        "    pred_xywh = pred_bbox[:, 0:4]\n",
        "    pred_conf = pred_bbox[:, 4]\n",
        "    pred_prob = pred_bbox[:, 5:]\n",
        "\n",
        "    # # (1) (x, y, w, h) --> (xmin, ymin, xmax, ymax)\n",
        "    pred_coor = np.concatenate([pred_xywh[:, :2] - pred_xywh[:, 2:] * 0.5,\n",
        "                                pred_xywh[:, :2] + pred_xywh[:, 2:] * 0.5], axis=-1)\n",
        "    # # (2) (xmin, ymin, xmax, ymax) -> (xmin_org, ymin_org, xmax_org, ymax_org)\n",
        "    org_h, org_w = org_img_shape\n",
        "    resize_ratio = min(input_size / org_w, input_size / org_h)\n",
        "\n",
        "    dw = (input_size - resize_ratio * org_w) / 2\n",
        "    dh = (input_size - resize_ratio * org_h) / 2\n",
        "\n",
        "    pred_coor[:, 0::2] = 1.0 * (pred_coor[:, 0::2] - dw) / resize_ratio\n",
        "    pred_coor[:, 1::2] = 1.0 * (pred_coor[:, 1::2] - dh) / resize_ratio\n",
        "\n",
        "    # # (3) clip some boxes that are out of range\n",
        "    pred_coor = np.concatenate([np.maximum(pred_coor[:, :2], [0, 0]),\n",
        "                                np.minimum(pred_coor[:, 2:], [org_w - 1, org_h - 1])], axis=-1)\n",
        "    invalid_mask = np.logical_or((pred_coor[:, 0] > pred_coor[:, 2]), (pred_coor[:, 1] > pred_coor[:, 3]))\n",
        "    pred_coor[invalid_mask] = 0\n",
        "\n",
        "    # # (4) discard some invalid boxes\n",
        "    bboxes_scale = np.sqrt(np.multiply.reduce(pred_coor[:, 2:4] - pred_coor[:, 0:2], axis=-1))\n",
        "    scale_mask = np.logical_and((valid_scale[0] < bboxes_scale), (bboxes_scale < valid_scale[1]))\n",
        "\n",
        "    # # (5) discard some boxes with low scores\n",
        "    classes = np.argmax(pred_prob, axis=-1)\n",
        "    scores = pred_conf * pred_prob[np.arange(len(pred_coor)), classes]\n",
        "    score_mask = scores > score_threshold\n",
        "    mask = np.logical_and(scale_mask, score_mask)\n",
        "    coors, scores, classes = pred_coor[mask], scores[mask], classes[mask]\n",
        "\n",
        "    return np.concatenate([coors, scores[:, np.newaxis], classes[:, np.newaxis]], axis=-1)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "050c9844",
      "metadata": {
        "id": "050c9844"
      },
      "source": [
        "### Calculate the Intersection Over Union value"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "72b703d4",
      "metadata": {
        "id": "72b703d4"
      },
      "source": [
        "Intersection over Union (IoU) is an evaluation technique for understanding how well the model is performing. However, during the time of inference, it is used to suppress the bounding-boxes that have a high IoU value with the bounding box with maximum probability."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "d6df6791",
      "metadata": {
        "id": "d6df6791"
      },
      "outputs": [],
      "source": [
        "def bboxes_iou(boxes1, boxes2):\n",
        "    boxes1 = np.array(boxes1)\n",
        "    boxes2 = np.array(boxes2)\n",
        "\n",
        "    boxes1_area = (boxes1[..., 2] - boxes1[..., 0]) * (boxes1[..., 3] - boxes1[..., 1])\n",
        "    boxes2_area = (boxes2[..., 2] - boxes2[..., 0]) * (boxes2[..., 3] - boxes2[..., 1])\n",
        "\n",
        "    left_up       = np.maximum(boxes1[..., :2], boxes2[..., :2])\n",
        "    right_down    = np.minimum(boxes1[..., 2:], boxes2[..., 2:])\n",
        "\n",
        "    inter_section = np.maximum(right_down - left_up, 0.0)\n",
        "    inter_area    = inter_section[..., 0] * inter_section[..., 1]\n",
        "    union_area    = boxes1_area + boxes2_area - inter_area\n",
        "    ious          = np.maximum(1.0 * inter_area / union_area, np.finfo(np.float32).eps)\n",
        "\n",
        "    return ious"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "b82006d4",
      "metadata": {
        "id": "b82006d4"
      },
      "source": [
        "### Non Max Suppression: select the most appropriate bounding box\n"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "0616b442",
      "metadata": {
        "id": "0616b442"
      },
      "source": [
        "It is the process of taking the boxes with maximum probability and suppressing the near-by boxes with non-max probabilities.  \n",
        "This is process is repeated until for a single object-class only one box is remaining."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "baec34ba",
      "metadata": {
        "id": "baec34ba"
      },
      "outputs": [],
      "source": [
        "def nms(bboxes, iou_threshold, sigma=0.3, method='nms'):\n",
        "    \"\"\"\n",
        "    :param bboxes: (xmin, ymin, xmax, ymax, score, class)\n",
        "    Note: soft-nms, https://arxiv.org/pdf/1704.04503.pdf\n",
        "          https://github.com/bharatsingh430/soft-nms\n",
        "    \"\"\"\n",
        "    classes_in_img = list(set(bboxes[:, 5]))\n",
        "    best_bboxes = []\n",
        "\n",
        "    for cls in classes_in_img:\n",
        "        cls_mask = (bboxes[:, 5] == cls)\n",
        "        cls_bboxes = bboxes[cls_mask]\n",
        "\n",
        "        while len(cls_bboxes) > 0:\n",
        "            max_ind = np.argmax(cls_bboxes[:, 4])\n",
        "            best_bbox = cls_bboxes[max_ind]\n",
        "            best_bboxes.append(best_bbox)\n",
        "            cls_bboxes = np.concatenate([cls_bboxes[: max_ind], cls_bboxes[max_ind + 1:]])\n",
        "            iou = bboxes_iou(best_bbox[np.newaxis, :4], cls_bboxes[:, :4])\n",
        "            weight = np.ones((len(iou),), dtype=np.float32)\n",
        "\n",
        "            assert method in ['nms', 'soft-nms']\n",
        "\n",
        "            if method == 'nms':\n",
        "                iou_mask = iou > iou_threshold\n",
        "                weight[iou_mask] = 0.0\n",
        "\n",
        "            if method == 'soft-nms':\n",
        "                weight = np.exp(-(1.0 * iou ** 2 / sigma))\n",
        "\n",
        "            cls_bboxes[:, 4] = cls_bboxes[:, 4] * weight\n",
        "            score_mask = cls_bboxes[:, 4] > 0.\n",
        "            cls_bboxes = cls_bboxes[score_mask]\n",
        "\n",
        "    return best_bboxes"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "a01bb4ca",
      "metadata": {
        "id": "a01bb4ca"
      },
      "source": [
        "### Load class name from a file"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "f7e95103",
      "metadata": {
        "id": "f7e95103"
      },
      "source": [
        "- 80 different classes: person, bicycle, car, motorbike, aeroplane, bus, train, truck, etc."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "dbced12b",
      "metadata": {
        "id": "dbced12b"
      },
      "outputs": [],
      "source": [
        "def read_class_names(class_file_name):\n",
        "    names = {}\n",
        "    with open(class_file_name, 'r') as data:\n",
        "        for ID, name in enumerate(data):\n",
        "            names[ID] = name.strip('\\n')\n",
        "    return names"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "f10420fe",
      "metadata": {
        "id": "f10420fe"
      },
      "source": [
        "### Output an image with all the bounding boxes"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "542f85d8",
      "metadata": {
        "id": "542f85d8"
      },
      "source": [
        "Below function is an helper function to draw the bounding box in the input image."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "def814e0",
      "metadata": {
        "id": "def814e0"
      },
      "outputs": [],
      "source": [
        "def draw_bbox(image, bboxes, classes=read_class_names(\"coco.names\"), show_label=True):\n",
        "    \"\"\"\n",
        "    bboxes: [x_min, y_min, x_max, y_max, probability, cls_id] format coordinates.\n",
        "    \"\"\"\n",
        "\n",
        "    num_classes = len(classes)\n",
        "    image_h, image_w, _ = image.shape\n",
        "    hsv_tuples = [(1.0 * x / num_classes, 1., 1.) for x in range(num_classes)]\n",
        "    colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))\n",
        "    colors = list(map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), colors))\n",
        "\n",
        "    random.seed(0)\n",
        "    random.shuffle(colors)\n",
        "    random.seed(None)\n",
        "\n",
        "    for i, bbox in enumerate(bboxes):\n",
        "        coor = np.array(bbox[:4], dtype=np.int32)\n",
        "        fontScale = 0.5\n",
        "        score = bbox[4]\n",
        "        class_ind = int(bbox[5])\n",
        "        bbox_color = colors[class_ind]\n",
        "        bbox_thick = int(0.6 * (image_h + image_w) / 600)\n",
        "        c1, c2 = (coor[0], coor[1]), (coor[2], coor[3])\n",
        "        cv2.rectangle(image, c1, c2, bbox_color, bbox_thick)\n",
        "\n",
        "        if show_label:\n",
        "            bbox_mess = '%s: %.2f' % (classes[class_ind], score)\n",
        "            t_size = cv2.getTextSize(bbox_mess, 0, fontScale, thickness=bbox_thick//2)[0]\n",
        "            cv2.rectangle(image, c1, (c1[0] + t_size[0], c1[1] - t_size[1] - 3), bbox_color, -1)\n",
        "            cv2.putText(image, bbox_mess, (c1[0], c1[1]-2), cv2.FONT_HERSHEY_SIMPLEX,\n",
        "                        fontScale, (0, 0, 0), bbox_thick//2, lineType=cv2.LINE_AA)\n",
        "            print('{} detected in frame with {}% probability '.format(classes[class_ind], score*100))\n",
        "        \n",
        "    return image"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "9aff6a84",
      "metadata": {
        "id": "9aff6a84"
      },
      "source": [
        "### Load the anchors from a file"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "b1ca5c16",
      "metadata": {
        "id": "b1ca5c16"
      },
      "source": [
        "The predefined anchors for the data-set on which the YoloV4 model was trained are loaded."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "c2e8b7fc",
      "metadata": {
        "id": "c2e8b7fc"
      },
      "outputs": [],
      "source": [
        "def get_anchors(anchors_path, tiny=False):\n",
        "    with open(anchors_path) as f:\n",
        "        anchors = f.readline()\n",
        "    anchors = np.array(anchors.split(','), dtype=np.float32)\n",
        "    return anchors.reshape(3, 3, 2)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "410bbc8d",
      "metadata": {
        "id": "410bbc8d"
      },
      "source": [
        "## Inference"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "e440c091",
      "metadata": {
        "id": "e440c091"
      },
      "source": [
        "### Create a session for inference based on the device selected"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "acb0a5ab",
      "metadata": {
        "id": "acb0a5ab"
      },
      "source": [
        "Inferencing using OpenVINO Execution provider under ONNX-Runtime, is performed using the below simple steps:\n",
        "    \n",
        "1. Create a ONNX Runtime Session Option instance using `onnxruntime.SessionOptions()`\n",
        "2. Using the session options instance create a Inference Session object by passing the model and the execution provider as arguments.\n",
        "Execution Providers are the hardware device options e.g. CPU, GPU, NPU on which the session will be executed.\n",
        "\n",
        "The below `create_sess` function actually takes care of the above steps. All we need to do is pass the device arguement to it. It'll return the appropriate session according to the selected device along with the input name for the model.\n",
        "\n",
        "The device option should be chosen from any one of the below options:  \n",
        "- `cpu, CPU, GPU, NPU`"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "8b8ceece",
      "metadata": {
        "id": "8b8ceece"
      },
      "outputs": [],
      "source": [
        "def create_sess(device):\n",
        "  so = rt.SessionOptions()\n",
        "  so.log_severity_level = 3\n",
        "  if (device == 'cpu'):\n",
        "    print(\"Device type selected is 'cpu' which is the default CPU Execution Provider (MLAS)\")\n",
        "    #Specify the path to the ONNX model on your machine and register the CPU EP\n",
        "    sess = rt.InferenceSession(model, so, providers=['CPUExecutionProvider'])\n",
        "  elif (device == 'CPU' or device == 'GPU' or device == 'NPU'):\n",
        "    #Specify the path to the ONNX model on your machine and register the OpenVINO EP\n",
        "    sess = rt.InferenceSession(model, so, providers=['OpenVINOExecutionProvider'], provider_options=[{'device_type' : device}])\n",
        "    print(\"Device type selected is: \" + device + \" using the OpenVINO Execution Provider\")\n",
        "    '''\n",
        "    other 'device_type' options are: (Any hardware target can be assigned if you have the access to it)\n",
        "    'CPU', 'GPU', 'NPU'\n",
        "    '''\n",
        "  else:\n",
        "    raise Exception(\"Device type selected is not [cpu, CPU, GPU, NPU]\")\n",
        "\n",
        "  # Get the input name of the model\n",
        "  input_name = sess.get_inputs()[0].name\n",
        "\n",
        "  return sess, input_name"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "94bbfc56",
      "metadata": {
        "id": "94bbfc56"
      },
      "source": [
        "### Specify the path to anchors file on your machine"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "ef9d3383",
      "metadata": {
        "id": "ef9d3383"
      },
      "outputs": [],
      "source": [
        "ANCHORS = \"yolov4_anchors.txt\"  \n",
        "STRIDES = [8, 16, 32]\n",
        "XYSCALE = [1.2, 1.1, 1.05]\n",
        "ANCHORS = get_anchors(ANCHORS)\n",
        "STRIDES = np.array(STRIDES)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "c66ab52f",
      "metadata": {
        "id": "c66ab52f"
      },
      "source": [
        "### Specify the path to model, and image"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "7c968df7",
      "metadata": {
        "id": "7c968df7"
      },
      "outputs": [],
      "source": [
        "model = \"yolov4.onnx\"\n",
        "image = \"cat.jpg\""
      ]
    },
    {
      "cell_type": "markdown",
      "id": "00e6b53e",
      "metadata": {
        "id": "00e6b53e"
      },
      "source": [
        "### Validate model file path"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "3031833b",
      "metadata": {
        "id": "3031833b"
      },
      "outputs": [],
      "source": [
        "check_model_extension(model)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "d446b2cf",
      "metadata": {
        "id": "d446b2cf"
      },
      "source": [
        "### Setup input and output"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "3bff0f18",
      "metadata": {
        "id": "3bff0f18"
      },
      "outputs": [],
      "source": [
        "if (image):\n",
        "    # Open the image file\n",
        "    if not os.path.isfile(image):\n",
        "        print(\"Input image file \", image, \" doesn't exist\")\n",
        "        sys.exit(1)\n",
        "    cap = cv2.imread(image)\n",
        "    output_file = image[:-4]+'_yolov4_out_py.jpg'"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "83da97e8",
      "metadata": {
        "id": "83da97e8"
      },
      "source": [
        "### Run the inference with CPU Execution Provider <a name=\"cpu_exec\"></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "0b265eee",
      "metadata": {
        "id": "0b265eee"
      },
      "source": [
        "Now the `yolov4.onnx` model will run inference on `cat.jpg` image using the below two execution providers:\n",
        "- `cpu`: default CPU Execution Provider (MLAS)  \n",
        "- `CPU`: Execution on CPU with OpenVino Execution Provider\n",
        "\n",
        "The below code block performs the following operations:\n",
        "\n",
        "1. Creates a Onnx Runtime Session using `cpu` as device\n",
        "2. Loads the input image and performs pre-processing using `image_preprocess` function\n",
        "3. Performs prediction on the same image for $100$-times\n",
        "4. Calculates average inference time\n",
        "5. Performs post-processing, non-max suppression & bounding-box drawing using the predicted outputs from the model\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "7e2091e9",
      "metadata": {
        "id": "7e2091e9"
      },
      "outputs": [],
      "source": [
        "### create a session with cpu which is the default CPU Execution Provider\n",
        "sess, input_name = create_sess(\"cpu\")\n",
        "\n",
        "input_size = 416\n",
        "original_image = cap.copy()\n",
        "original_image = cv2.cvtColor(original_image, cv2.COLOR_BGR2RGB)\n",
        "original_image_size = original_image.shape[:2]\n",
        "\n",
        "image_data = image_preprocess(np.copy(original_image), [input_size, input_size])\n",
        "image_data = image_data[np.newaxis, ...].astype(np.float32)\n",
        "\n",
        "outputs = sess.get_outputs()\n",
        "output_names = list(map(lambda output: output.name, outputs))\n",
        "\n",
        "print(\"PREDICTION - BEGIN\")\n",
        "#warmup\n",
        "sess.run(output_names, {input_name: image_data})\n",
        "\n",
        "start = time.time()\n",
        "for i in range(100):\n",
        "  detections = sess.run(output_names, {input_name: image_data})\n",
        "\n",
        "end = time.time()\n",
        "inference_time = end - start\n",
        "print('Avg Inference time in ms: %f' % (inference_time/100 * 1000))\n",
        "print(\"PREDICTION - END\")  \n",
        "\n",
        "pred_bbox = postprocess_bbbox(detections)\n",
        "bboxes = postprocess_boxes(pred_bbox, original_image_size, input_size, 0.25)\n",
        "bboxes = nms(bboxes, 0.213, method='nms')\n",
        "image_out = draw_bbox(original_image, bboxes)\n",
        "\n",
        "cv2.putText(image_out,\"cpu\",(10,20),cv2.FONT_HERSHEY_COMPLEX,0.5,(255,255,255),1)\n",
        "\n",
        "image_out = cv2.cvtColor(image_out, cv2.COLOR_BGR2RGB)\n",
        "cv2_imshow(image_out)\n",
        "\n",
        "#Write the frame with the detection boxes\n",
        "cv2.imwrite(output_file, image_out.astype(np.uint8))\n",
        "\n",
        "print('\\n')"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "2a86773c",
      "metadata": {
        "id": "2a86773c"
      },
      "source": [
        "### Run the inference **with** OpenVINO Execution Provider"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "8c04daa0",
      "metadata": {
        "id": "8c04daa0"
      },
      "source": [
        "The below code block performs the same opertions as [before](#cpu_exec) with `CPU` as device, that runs on OpenVINO Execution Provider for ONNX Runtime."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "a72e6bf7",
      "metadata": {
        "id": "a72e6bf7"
      },
      "outputs": [],
      "source": [
        "### create a session with CPU_FP32 using the OpenVINO Execution Provider\n",
        "sess, input_name = create_sess(\"CPU\")\n",
        "\n",
        "input_size = 416\n",
        "original_image = cap.copy()\n",
        "original_image = cv2.cvtColor(original_image, cv2.COLOR_BGR2RGB)\n",
        "original_image_size = original_image.shape[:2]\n",
        "\n",
        "image_data = image_preprocess(np.copy(original_image), [input_size, input_size])\n",
        "image_data = image_data[np.newaxis, ...].astype(np.float32)\n",
        "\n",
        "outputs = sess.get_outputs()\n",
        "output_names = list(map(lambda output: output.name, outputs))\n",
        "\n",
        "print(\"PREDICTION - BEGIN\")\n",
        "#warmup\n",
        "sess.run(output_names, {input_name: image_data})\n",
        "\n",
        "start = time.time()\n",
        "for i in range(100):\n",
        "  detections = sess.run(output_names, {input_name: image_data})\n",
        "\n",
        "end = time.time()\n",
        "inference_time = end - start\n",
        "print('Avg Inference time in ms: %f' % (inference_time/100 * 1000))\n",
        "print(\"PREDICTION - END\")  \n",
        "\n",
        "pred_bbox = postprocess_bbbox(detections)\n",
        "bboxes = postprocess_boxes(pred_bbox, original_image_size, input_size, 0.25)\n",
        "bboxes = nms(bboxes, 0.213, method='nms')\n",
        "image_out = draw_bbox(original_image, bboxes)\n",
        "\n",
        "cv2.putText(image_out,\"CPU\",(10,20),cv2.FONT_HERSHEY_COMPLEX,0.5,(255,255,255),1)\n",
        "\n",
        "image_out = cv2.cvtColor(image_out, cv2.COLOR_BGR2RGB)\n",
        "cv2_imshow(image_out)\n",
        "\n",
        "print('\\n')\n",
        "\n",
        "#Write the frame with the detection boxes\n",
        "cv2.imwrite(output_file, image_out.astype(np.uint8))\n",
        "\n",
        "print('\\n')"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [],
      "name": "OVEP_yolov4_obj_detection_sample.ipynb",
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3.8.10 ('yolov4': venv)",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.10"
    },
    "vscode": {
      "interpreter": {
        "hash": "7c4a41788a1f5a54de98941c88996114c38916598df611346d841c8cb6d26f64"
      }
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}
